Action Recognition Using Super Sparse Coding Vector with Spatio-temporal Awareness
Abstract
This paper presents a novel framework for human action recognition based on sparse coding. We introduce an effective coding scheme to aggregate low-level descriptors into the super descriptor vector (SDV). To incorporate spatio-temporal information, we propose a novel super location vector (SLV) that models the space-time locations of local interest points far more compactly than spatio-temporal pyramid representations. SDV and SLV are finally combined into the super sparse coding vector (SSCV), which jointly models motion, appearance, and location cues. This representation is computationally efficient and yields superior performance while using linear classifiers. In extensive experiments, our approach significantly outperforms state-of-the-art results on two public benchmark datasets, HMDB51 and YouTube.
Similar resources
Efficient Local Feature Encoding for Human Action Recognition with Approximate Sparse Coding
Local spatio-temporal features are popular in the human action recognition task. In practice, they are usually coupled with a feature encoding approach, which helps to obtain the video-level vector representations that can be used in learning and recognition. In this paper, we present an efficient local feature encoding approach, which is called Approximate Sparse Coding (ASC). ASC computes the...
Human Action Recognition Based on 3D Edge Oriented Gradient Histogram of Slide Blocks
In this paper, a new feature called 3D edge oriented gradient histogram of slide blocks is proposed for human action recognition, based on the idea that the slide area of the human body edge can be seen as a spatio-temporal silhouette surface when a human performs a certain action in a video. This feature is processed by defining dense 3D spatio-temporal slide blocks on the spatio-temporal silhouette...
Action recognition via spatio-temporal local features: A comprehensive study
Local methods based on spatio-temporal interest points (STIPs) have shown their effectiveness for human action recognition. The bag-of-words (BoW) model has been widely used and has dominated this field. Recently, a large number of techniques based on local features including improved variants of the BoW model, sparse coding (SC), Fisher kernels (FK), vector of locally aggregated descriptors (VL...
Learning Linear Dynamical Systems with High-Order Tensor Data for Skeleton based Action Recognition
In recent years, there has been renewed interest in developing methods for skeleton-based human action recognition. A skeleton sequence can be naturally represented as a high-order tensor time series. In this paper, we model and analyze tensor time series with a Linear Dynamical System (LDS), which is the most common model for encoding spatio-temporal time-series data in various disciplines due to its r...
Spatio-Temporal VLAD Encoding for Human Action Recognition in Videos
Encoding is one of the key factors for building an effective video representation. In recent work, super vector-based encoding approaches are highlighted as one of the most powerful representation generators. Vector of Locally Aggregated Descriptors (VLAD) is one of the most widely used super vector methods. However, one of the limitations of VLAD encoding is the lack of spatial informatio...
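For readers unfamiliar with super vector encodings, the following is a minimal sketch of plain VLAD, the baseline the entry above refers to. It is not the spatio-temporal variant that paper proposes, nor the SDV/SLV scheme from the abstract. The function name `vlad_encode` and the toy centroids and descriptors are assumptions made for illustration; in practice the centroids would come from k-means over training descriptors.

```python
import math


def vlad_encode(descriptors, centroids):
    """Aggregate local descriptors into one L2-normalized VLAD vector.

    descriptors: list of equal-length lists of floats (local features)
    centroids:   list of equal-length lists of floats (visual words)
    """
    k = len(centroids)
    dim = len(centroids[0])
    # One residual accumulator per visual word.
    residual_sums = [[0.0] * dim for _ in range(k)]

    for x in descriptors:
        # Hard-assign the descriptor to its nearest centroid
        # (squared Euclidean distance).
        nearest = min(
            range(k),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(x, centroids[j])),
        )
        # Accumulate the residual between descriptor and centroid.
        for d in range(dim):
            residual_sums[nearest][d] += x[d] - centroids[nearest][d]

    # Concatenate the per-word residual sums into one vector of size k * dim.
    vector = [v for row in residual_sums for v in row]

    # Global L2 normalization (left unchanged if the vector is all zeros).
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else vector


# Toy data (hypothetical): two visual words, three 2-D descriptors.
toy_centroids = [[0.0, 0.0], [10.0, 10.0]]
toy_descriptors = [[1.0, 0.0], [0.0, 1.0], [11.0, 10.0]]
toy_vlad = vlad_encode(toy_descriptors, toy_centroids)
```

Note that the resulting vector records which visual words the descriptors fall near and how they deviate from them, but nothing about where in the video each descriptor was found, which is the missing location information that the entry above and the SLV in the abstract both set out to address.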